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Abstract 

It is an effective strategy to use both genetic perturbation data and gene expression data to infer regulatory networks that 
aims to improve the detection accuracy of the regulatory relationships among genes. Based on both types of data, the 
genetic regulatory networks can be accurately modeled by Structural Equation Modeling (SEM). In this paper, a linear 
regression (LR) model is formulated based on the SEM, and a novel iterative scheme using Bayesian inference is proposed to 
estimate the parameters of the LR model (LRBI). Comparative evaluations of LRBI with other two algorithms, the Adaptive 
Lasso (AL-Based) and the Sparsity-aware Maximum Likelihood (SML), are also presented. Simulations show that LRBI has 
significantly better performance than AL-Based, and overperforms SML in terms of power of detection. Applying the LRBI 
algorithm to experimental data, we inferred the interactions in a network of 35 yeast genes. An open-source program of the 
LRBI algorithm is freely available upon request. 
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Introduction 

Exploring the structure of Gene Regulatory Networks (GRN) is 
a key element in understanding gene functions, especially in some 
complex diseases [1-3]. Direct experimental methods to explore 
the relationships among genes are time-consuming and labor- 
intensive. Statistical inference on GRN is a process of identifying 
gene interactions from limited experimental data using computa- 
tional analysis, and is much more efficient. 

Several models have been applied to describe the GRN. An 
intuitive and frequently applied method is to model the GRN as 
graphs [4-6], where the genes are considered as nodes and the 
interactions among them represented as edges. Several graphical 
methods, including directed acyclic graphs and directed cyclic 
graphs, have been proposed in [7-9] . GRN can also be modeled 
by the graphical Gaussian model [10], or the Bayesian network 
model [11]. Information theory, for instance, mutual information 
and synergy, can be also used to infer the GRN [12,13]. Due to 
high measurement cost of gene chip technology, only limited 
number of samples can be obtained. This limitation may result in 
low inference accuracy when applying synergy or mutual 
information to analyze the GRN. 

In the last decade, Structural Equation Modeling (SEM) [14] 
has been used to infer GRN [9,15,16]. Exploiting genetic 
perturbation data and gene expression data, the work in [16] 
used SEM model via an adaptive Lasso based algorithm (AL- 
Based) to infer the networks. With simulations, the authors showed 
that the AL-based method had better performance than all other 
existing methods. With the two same types of data, Cai et. al. 



introduced a sparse SEM model, and stated that their Sparsity- 
aware Maximum Likelihood (SML) algorithm significandy 
outperformed all other algorithms, including the AL-based one 
[17,18]. 

In this paper, we also study the gene regulatory networks with 
SEM model using both genetic perturbation data and gene 
expression data, and transfer the SEM to a Linear Regression (LR) 
model through matrix transformation. In this transformation 
process, regulatory information in GRN will not be lost. Instead of 
ML approaches or classic Lasso methods, we propose an approach 
to infer the networks via the LR model by using a Bayesian 
method (LRBI). Simulations show that our LRBI algorithm is 
effective and reliable, and offers significandy better performance 
than the AL-based algorithm. Compared with SML, LRBI has 
significandy better performance in terms of power of detection, but 
has slightly worse performance in false discovery rate. LRBI also 
has the advantages that the estimation of the initial parameters 
and the consideration of the data sensitivity are not needed. 

Model and Methods 

The LR model for gene network inference 

We consider m genes, n individuals' measurement using 
microarray. Without loss of generality, we assume that there are 
m makers. As in [9,16,17], the GRN obeys the form of SEM, 
where genes are the nodes, and interactions among genes are the 
edges, i.e. 
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P = BP + AX + e, 



(1) 



p{0 k ) =p(A k ,V k ) =p{V k )p{A k \V k ) 



(5) 



where P is an m x n matrix, p k j is the y'th expression level of the 
kth gene; B is an m x m matrix, defining the structure of the gene 
regulatory networks, b k j is the regulatory effect of the jth gene on 
the kth gene; Xisanmxn matrix, x k j is the genotype of the kth 
marker in the y'th perturbation; A is an m x m matrix representing 
the effect of each eQTL; e is an m x n matrix, and e k j is the jth 
measurement noise of the kth gene. All elements in e are 
independent and identically distributed (i.i.d). 

We assume that there is no self-loop of each gene, so that all 
diagonal entries of B are zeros. We also assume that each gene has 
its own corresponding QTL, and the loci of the m eQTLs have 
been determined by an existed method, but the effects of these 
eQTLs are unknown yet. Therefore A has m unknown entries, 
and all other entries are zeros. Without loss of generality, we 
assume that all the unknowns in A are the diagonal entries. 

With the predetermined eQTLs matrix X and the gene 
expression data P, the inference for GRN is to determine the 
unknown entries of B and A with appropriate optimization 
methods. 

Since all the unknown parameters are in (B,A), (1) can be 
written as follows 



p=nn+e 



where P is still the mxn matrix defined above, II = 



(2) 
(BA), 



. We further rewrite (2) to 
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(3) 



where Y k is the fcthrow of P, A k is the fcthrow of II, z k is the 
£throw of e. 

By the definition and the structure of (3), we can infer the 
parameters row by row. Therefore, the problem can be 
decomposed into 



Y k = A k Sl + E k ,k=l,2,...,m 



(4) 



In (4), the parameters that need to be inferred are 
Ay,ij=l,2,...,»i,i#y, and A, +m> ,-,/= l,2,...,m. 



Rewriting (5) leads to 

P{O k )=p{q> ek )p{A k \q> ek ) 



(6) 



We assume that (A k ,(p rk l ) has a joint prior distribution of 
Gaussian- Gamma [19], with 



Vsk ~ ■ Gamma(<xoek,PoEk) 



A k\p ek ~ Normal (A ok ,<p ek U 0yk ) 



(7) 



(8) 



where tXOeJti/W'Aofc are hyper parameters, should be preset to 
fixed values. Uo yk is a symmetric positive definite matrix. We will 
set it to an identity matrix in the implementation of algorithm for 
simplification. 

The likelihood is 

/.(Y^A^^^cclfel-lexp^- ^^(yjH-AtO,) 2 ^ (9) 



where Qi is the ith column of SI. The joint posterior distribution 
of (A k ,<p sk ) is proportional to the product of the prior and the 
likelihood 



p{A k ,(pJP,Sl) ocp{A k ,(p sk )p{Y k \A k ,(p dc ,Sl) 



(10) 



According to the prior distribution in (7~8) and the likelihood 
in (9), the joint posterior distribution (10) can be written as 

/ 7(A fc;fe |P,Q)cc^ < " /2+a °*- I) exp(-^ 0at ^ 1 ) 



-nil 

*<p A * exp 



;9,k 



\ 
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(ii) 



=p(<p^\v,si)*p(A k \<p A ?,a) 



Bayesian inference for the LR models 

In gene regulatory networks, most entries of A k are zeros, so A k 
is sparse. Therefore, we assume that all entries of A k follow 
Gaussian distribution with mean zeros. We also assume that 
entries of E k are i.i.d, and normally distributed with mean zeros 
and variance = q> Fk l, where I is an n x n identity matrix. 

With known SI, the parameters to be estimated in (4) are 
Q k = (Ai,T^). The joint prior distribution can be factorized as: 



where 



P (<Pek l p > a ) ~ Gamma [2 - 1 n + a 0r .k ,P, k ] ( 1 2 ) 



p(A k \¥,Sl) ~Normal[* k ,f A A k ] 



(13) 
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and 

A,= (H 0 -,l+nn T ) _1 (14) 

a,= [A,(H 0 -iAT+nYj)] T (15) 

/U = /W + 2-' (YtYj+AofcH^ASt -»*A t - 1 4) (16) 

More details of the derivation can refer to Text S 1 . A similar 
result was shown in [14,20], where (12 — 16) were parts of an 
iterative process to solve the Confirmatory Factor Analysis (CFA) 
model. 

We can sample the posterior distributions in (12) and (13) to 
constitute an iterative process. Since the values in (14) — (16) are all 
determined, the parameters of the Gamma distribution in (12), 
2~ 1 M + arj£zt and f}^, are all fixed. As a result, the posterior 
distribution of (p~ k will not be affected by the samples of A^. It is 
noted that only ip^ 1 ,Ak can be sampled, therefore, the iterative 
process may be not effective and accurate . Thus, we modify the 
calculations of &k,P s k m (15) and (16), and substitute Aojt by A^. 

a*=[A,(H 0 -X + QYT)] T (17) 

/^ = Am + 2-' (YjfcYj+AtH^-.^ 1 ^) (18) 

The combination of (12, 13) and (17, 18) forms an iterative 
process. We execute this iterative process with a sufficient number 
of times, and until a steady state is reached. A sequence of sets of 
A^ are obtained by sampling from the posterior distribution in 
(13), which are then averaged to get the estimated parameters of 
Afc. To get accurate results, we must guarantee that the iteration 
reaches its steady state. A simple stopping condition is to test the 
value of the square difference of the inferred parameters between 

two successive iterations, i.e. Yl + ^i?) ■ ^ me difference 
is small enough (say T< 0.001), the iteration has reached a stable 
state. The choice of T can influence the accuracy. The smaller z is, 
the higher the accuracy of the parameter approximation is, 
naturally at the cost of more iterations and increased computa- 
tional time. 

The sketch of an algorithm is as follows: 

Input the eQTLs matrix X, and the gene expression data P; Set the initial 
hyperparameters txoafc=w/5, /?oafc = 1 > Howfc = I- Ao/t is set to a 
1 * Imvector, where only the kth entry is 1, all other entries are zeros. 
k= \,2,...,m; Assign a small value to x. 

For k= 1,2,. ..,m 

Calculate Ak,2ik,P e ,kb> (14~16); 

Repeat: 

1. Get the sample of f^. 1 from the Gamma distribution by (12); 

2. Get the sample of A^ from the Normal distribution by (13); 

3. Calculate & k ,P,k by (17) (18); 

4. Calculate S= £ (a£ +1) - A®) ; 



5. If S <T,then end the iteration, else go to step 1; 
End for 

More details of the algorithm implementation can refer to the 
software package Fl. 

It should be noted that it would be better to choose the second 
half of the samples and average them to get accurate result. The 
reason is that at the beginning of the iteration, the gap between the 
estimated A® and the true values is large. 

Results 

Simulations 

Let Ne be the number of edges in B (the original network), Nje 
be the number of edges in B (the inferred network), Nj alse be the 
number of edges which exist in B but not in B, N lrue be the 
number of edges which exist in both B and B, therefore 
N 'false + N 't rU e = N iE- Define Power of Detection PD = N' tru J N E , 
and False Discovery Rate FDR = N , fahe N IE . 

Logsdon and Mezey [16] had shown that the AL-based 
algorithm outperformed the PC-algorithm [21,22], the QTLnet 
algorithm [23], and had comparable performance with the QDG 
algorithm [24]. Cai et al. stated that their SML algorithm offered 
significandy better performance than the AL-based algorithm and 
the QTL algorithm in PD and FDR [17,18]. Therefore, we shall 
compare our LRBI algorithm with SML and AL-based. 

Firstiy, we carried out simulations following the setups in [16]. 
We simulated two types of directed acyclic gene networks: one 
with 10 genes and the other one with 30 genes. Averaged N e = 3 
edges were created per gene, which meant that there were on 
average 3 edges created between one gene and all other genes. If 
an edge existed from node j to node i, then by was sampled from a 
uniform distribution on the interval ( — 1 —0.5) U (0.51); other- 
wise by was set to 0. Entries of X took values from the set 
{l,2,3}with the corresponding probabilities 0.25, 0.5, and 0.25 
respectively. Each gene has its own corresponding QTL, and A is 
assumed to be an identity matrix. Each entry of e in (1) was 
sampled from a Gaussian distribution A(0,0.01). P was calculated 
by (1). 

We generated cyclic or acyclic networks for simulations, and 
used LRBI to infer the parameters of the simulated networks. For 
cyclic networks,LRBI can obtain the steady-state solutions 
naturally. By inference, the steady regulatory relations can be 
got, if some cyclic regulatory relations among genes existed. 

Due to the inference characteristic of Bayesian methods, the 
estimated parameters are not regressed to zeros as in Lasso 
methods. Therefore, an edge from gene j to gene i is considered to 

be present if |^-| >0.05, otherwise, there is no edge from gene j to 

gene i. 

Simulation results for the setups described above are shown in 
Figure 1, where (a) and (b) are for the gene network of m = 10, (c) 
and (d) are for the gene network of m = 30. LRBI has a better 
performance than SML in terms of PD, but SML outperforms 
LRBI algorithm in terms of FDR. Both LRBI and SML 
significandy outperform the AL-Based algorithm in terms of PD 
and FDR. The PD of LRBI reaches 1 when the number of 
samples is 20 or more for both the two scenarios m=10 and 
m = 30. 

Secondly, we simulated two types of directed cyclic gene 
networks: one with 10 genes and the other one with 30 genes. 
Averaged N e = 3 edges were created per gene. We employed the 
same procedure used in the acyclic scenario to generate 
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(a) Directed acyclic network 1 Ng=10,sigma2=0.01;Ne=3,emso=0.05 
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(b) Directed acyclic network, Ng=10,sigma2=0.01;Ne=3,emso=0.05 
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(c) Directed acyclic network, Ng=30,sigma2=0.01;Ne=3,emso=0.05 



(d) Directed acyclic network, Ng=30,sigma2=0.01;Ne=3,emso=0.05 
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Figure 1. Performance of LRBI for directed acyclic networks. The performance of SML and AL-Based algorithms is also shown for 
comparisons. The average number of edges per node is iV c = 3, the variance of noise is 0.01, and no edge exists if Lb). ■ <0.05 for decision, (a) and (b) 
are for a gene network of m = 10, (c) and (d) are for a gene network of m = 30. 
doi:1 0.1 371 /journal.pone.0083263.g001 



B,A,X,P,e. Simulation results are shown in Figure 2, where (a) 
and (b) are for the gene network of in = 1 0, (c) and (d) are for the 
gene network of in = 30. LRBI has significantly better performance 
than SML in terms of PD, and SML outperforms LRBI algorithm 
in terms of FDR. When the number of samples is large enough, 
the FDRs of LRBI and SML are all close to zeros. Both LRBI and 
SML significandy outperform the AL-Based algorithm in PD and 
FDR. 

Thirdly, we simulated the impact of different decision thresholds 
on performances. We used a bigger network m= 100. Averaged 
N e = 3 edges were created per gen, and the variance of noise was 
0.01. Three decision thresholds if = 0.05,0.1,0.2 were simulated. 

In simulations, if we found that |i'^-|<^, then we set l> t j = Q. A 

directed acyclic network and a directed cyclic network were 
simulated and the results were separately shown in Figure 3 (a) (b) 
and Figure 3 (c) (d). 

We continued the simulations with a even bigger network with 
N e = 3, in = 300 to study the impact of decision thresholds on 
performance. The variance of noise was 0.01. Two decision 
thresholds £ = 0.1 and £ = 0.2 were simulated respectively. Both 
directed acyclic network and directed cyclic network were 
simulated, and the results were separately shown in Figure 4 (a) 
(b) and Figure 4 (c) (d). As confirmed by Figure 3 and Figure 4, a 
large decision threshold can reduce the FDR, but it also lowers the 
PD. Therefore, the decision thresholds used in simulations or 
applications should be chosen carefully. 

Finally, we evaluated the impact of noise levels on the 
performance of LRBI. Here, we used networks with »2=100, 
N e = 3. Again, we applied LRBI to both directed acyclic and 
directed cyclic networks. The variance of noise was set to 0.01, 
0.05, and 0.1 respectively. The simulation results are shown in 
Figure 5, where (a) and (b) are for directed acyclic network, (c) and 



(d) are for directed cyclic network. We find that the PD 
performance is always excellent, but the FDR of LRBI is worse 
when the noise level increases, even when the number of samples is 
relatively large. 

We have stated that LRBI cannot infer parameters to zeros 
automatically, but can infer them with high precision. In most of 
the simulations we conducted, the decision thresholds are 0.05. 
That is to say, if the value inferred is lower than 0.05, the entry is 
considered to be zero. This implies that the numerical difference 
between the inferred value and the original value is less than 0.05 
for most of the entries in regulatory networks. We define that 

numerical difference as INEr(iJ) = Ibg — fc'J. Through simula- 
tions, we found that INEr was also very small for the entry whose 
value is nonzero in the original network. This feature is very 
meaningful, because the inferred parameters can accurately 
indicate the regulatory relationships among genes. An acyclic 
network was simulated, with m = 30,7V c = 3, sigma2 = 0.01, deci- 
sion threshold £ = 0.1. Some results are shown in Table 1. 

Case study 

Here, we applied LRBI to infer the gene regulatory networks 
using the gene expression data and the genetic makers, which were 
assayed in 112 segregants of a cross between the yeast strains 
BY4716 and RM1 1-la [25]. The cross had 5727 genes with smaU 
number of samples, so a pretreatment process was needed to select 
strong cis-eQTLs and interactions among genes [16]. We dealt 
with the filtered dataset provided by Logsdon [16], in which only 
35 genes were used. The 35 yeast genes are SEOl, NUP60, 
RCY1, IRC18, TPK3, PHD1, JLP1, SNF7, PCD1, RPL19A, 
SEN1, OST6, BUB2, BUL1, PHA2, ORC5, FYV6, SLM2, 
HAL9, RDL1, POC4, ASM, ECM13, TYR1, RNQJ, SFA1, 
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(a) Directed cyclic network, Ng=10,sigma2=0.01;Ne=3,emso=0.05 



(b) Directed cyclic network, Ng=10,sigma2=0.01;Ne=3,emso=0. 05 
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(c) Directed cyclic network,Ng=30,sigma2= : 0.01;Ne=3,emso=0.05 
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(d) Directed cyclic network, Ng=30,sigma2=0. 01 ;Ne=3,emso=0. 05 
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Figure 2. Performance of our LRBI algorithms for directed cyclic networks. The performance of SML and AL-Based algorithms is also shown 
for comparisons. The average number of edges per node is N e =3, the variance of noise is 0.01, and no edge exists if \b\j <0.05 for decision, (a) and 
(b) are for a gene network of m= 10, (c) and (d) are for a gene network of m = 30. 
doi:1 0.1 371 /journal.pone.0083263.g002 



(a) Directed acyclic network, Ng=300,sigma2=0. 01 ;Ne=1 (b) Directed acyclic network, Ng=300,sigma2=0. 01 ;Ne=1 
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(c) Directed cyclic network, Ng=300,sigma2=0. 01 ;Ne=1 (d) Directed cyclic network, Ng=300,sigma2=0. 01 ;Ne=1 
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Figure 3. Performance of LRBI algorithms for various decision thresholds. Two network cases are simulated to find the impact of decision 
thresholds on PD and FDR. (a) and (b) are for directed acyclic networks, (c) and(d) are for directed cyclic networks; m= 100,JV e = 3, the variance of 
noise is 0.01. Thresholds are 0.05, 0.1, 0.2 (emso in figure). 
doi:1 0.1 371 /journal.pone.0083263.g003 
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(a) Directed acyclic network. Ng=1 00, sigma2=0. 01 ;Ne=3 



(b) Directed acyclic network, Ng=1 00, sigma2=0. 01 ;Ne=3 
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(c) Directed acyclic network, Ng=100,sigma2=0.01;Ne=3 
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(d) Directed acyclic network, Ng=1 00, sigma2=0. 01 ;Ne=3 




emso=0.05 

■ emso=0.1 

■ emso=0.2 



300 400 500 

Number of Samples 



Figure 4. Performance of LRBI algorithms for various decision threshold with m = 300. Two network cases are simulated to find the impact 
of decision thresholds on PD and FDR. (a) and (b) are for directed acyclic networks, (c) and(d) are for directed cyclic networks; m = 300, N e = 3, the 
variance of noise is 0.01 Thresholds are 0.1 and 0.2. 
doi:10.1371/journal.pone.0083263.g004 



(a) Directed acyclic network, Ng=100;Ne=3, emso=0.05 
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(b) Directed acyclic network, Ng=100;Ne=3, emso=0.05 
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(c) Directed cyclic network, Ng=100;Ne=3, emso=0.05 
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(d) Directed cyclic network, Ng=100;Ne=3, emso=0.05 
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Figure 5. Performance of LRBI algorithm under various noise levels. Two network cases are simulated to find the impact of noise level on PD 
and FDR. (a) and (b) are for directed acyclic networks, (c) and (d) are for directed cyclic networks; «7 = 100, Decision thresholds is 0.05. Three noise 
levels are simulated. 
doi:1 0.1 371 /journal.pone.0083263.g005 
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Table 1. Some INErs of a network inferred by LRBI. 
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doi:1 0.1 371 /journal.pone.0083263.t001 

PRM7, SAN1, HIM1, YEL073C, SAP1, SNZ3, MST27, 
YHR054C, DAL7. 

With 112 samples for these 35 genes, and the eQTLs data, we 
inferred the regulatory network as shown in Figure 6. It is noted 
that our algorithm doesn't need to assume the network is cyclic or 
acyclic. There are 145 edges in the inferred network. A total of 31 
genes are regulators of at least one target, and 32 genes have at 



least one regulator. A total of 28 genes occur both as regulators 
and targets. 

There were only 4 instances of reciprocal regulation (two 

-0.5113 

genes act on each other) presented: ORC5 ^ SNF7, 

-0.6901 

-0.4259 -0.3952 0.3480 

CM13 *± YL14A, YHL4 ^ RDL1, DAL7 *=> HIM1. 

-0.3897 -0.4000 0.4959 




Figure 6. Regulatory network reconstruction for the 35 genes. These genes are filtered out by the methods as in [16]. In the figure, a solid line 
denotes that the interaction between two genes is positive regulatory, while a dotted line denotes a negative regulatory. Some color lines are used to 
make the figure clear. There are 145 regulator-target pairs, among which, 78 pairs are positive regulations, and 67 pairs are negative regulations. 
doi:1 0.1 371 /journal. pone.0083263.g006 



PLOS ONE | www.plosone.org 



7 



December 2013 | Volume 8 | Issue 12 | e83263 



Inference of GRN with Linear Regression Model 



Among the 145 regulator-target pairs, there are 78 positive 
regulations, and 67 negative regulations. To verify the inference 
result, we used the Generate Regulation Matrix tool in the website 
of YEASTRACT [26] to create the gene regulatory network with 
the 35 selected yeast genes described above. In the network 
generated by the tool, there are only three regulatory relationship, 
HIM1 regulated by HAL9 [27], YEL073C regulated by PHD1 
[27], and FYV6 regulated by PHD1 [28]. From the experimental 
yeast data, we deduced two out of the three regulatory 
relationships, HIM1 regulated by HAL9, YEL073C regulated by 
PHD1. 

Conclusion and Discussion 

We modeled the gene regulatory networks by using a LR model, 
and proposed a Bayesian method to complete the inference. We 
conducted a series of simulations to evaluate the performance of 
the proposed algorithm LRBI, and compared LRBI with another 
two algorithms, the AL-Based and the SML algorithms. LRBI had 
a significantly better performance than AL-based regarding to 
both PD and FDR. Compared to SML, LRBI showed a better 
performance in PD and slightly worse in FDR. This feature of 
LRBI makes more sense. Considering two cases, one is that we can 
find less false edges but loss more true edges, the other one is that 
we can find more, or even all true edges among genes, but with 
slightly more false edges, the latter one is more meaningful. 

The proposed algorithm was accurate, and the gap between the 
inferred and the original parameters was less than 5% (even 2%) in 
most case. The proposed algorithm was also very effective. We 
inferred the GRN of the 35 yeast genes in a short time (1 .2 seconds 
in a laptop), while for the SML algorithm, a program error 
occurred after about 52 minutes' run with the same 35 yeast genes 
data set. LRBI also had the benefit that the dependency of the 
performance on the estimates of initial parameters is not strong. 
For simplicity, we just assign some constants to these parameters in 
simulations and case studies. Therefore, the LR model and the 
LRBI algorithm can provide an effective way of exploiting both 
gene expression and perturbation data to infer GRN. 
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